Maximum Likelihood Principle

The goal of maximum likelihood is to fit a distribution to some data.
Using Bayes' theorem, we want to find the most likely values for the parameters of our model, given the data.

$$\underset{\theta}{\operatorname{argmax}}\; P(\theta|X) = \underset{\theta}{\operatorname{argmax}}\; \frac{P(X|\theta)\,P(\theta)}{P(X)}$$

Where:

  • P(θ) is called the prior
  • P(X|θ) is called the likelihood, which is not really a probability (it is a function of θ, not a distribution over θ)
  • P(θ|X) is called the posterior

The likelihood is equal to the probability density function of a Gaussian, if we assume that the data was generated by a Gaussian distribution.
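As a quick sketch of this idea: for a single 1-D data point, the likelihood of (μ, σ) is just the Gaussian pdf evaluated at that point (the point x = 30 below is a made-up example, not from the notes):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The likelihood of parameters (mu, sigma) given one observed point
# is the pdf evaluated at that point:
likelihood = gaussian_pdf(30.0, 28.0, 2.0)
print(likelihood)
```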

Now the real stuff lol

We basically brute-force fit Gaussian distributions to the data and keep the one that maximizes the likelihood function.

So we begin by fitting the Gaussians, starting with μ=28 and σ=2:

Which yields a likelihood of:

$$L(\mu,\Sigma;x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right) = 0.03$$

But we can do better right?
In fact, if we plug in μ=30 and σ=2:

We get a likelihood of 0.12, which is definitely better!
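We can reproduce this kind of comparison in code. The data point x = 30 is an assumption here (the plots from the notes are not shown), but the point is just that one choice of μ gives a higher likelihood than the other:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

x = 30.0  # hypothetical observed data point (assumption)
l_mu28 = gaussian_pdf(x, mu=28.0, sigma=2.0)
l_mu30 = gaussian_pdf(x, mu=30.0, sigma=2.0)
print(l_mu28, l_mu30)  # the second likelihood is larger
```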

By the way, if we plot the likelihood over all the possible values of μ, we can actually see that the maximum likelihood is reached where the derivative of the whole thing is 0:
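A minimal sketch of that "hovering over all values of μ" idea, again assuming a single hypothetical data point x = 30 and a fixed σ = 2: we sweep μ over a grid and keep the value that maximizes the likelihood, which lands right on top of the data point.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

x, sigma = 30.0, 2.0  # hypothetical data point and fixed sigma (assumptions)

# Sweep mu over [20, 40] and keep the value that maximizes the likelihood.
grid = [20.0 + 0.01 * k for k in range(2001)]
best_mu = max(grid, key=lambda mu: gaussian_pdf(x, mu, sigma))
print(best_mu)  # peaks at mu = x, where the derivative is 0
```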

If we have multiple data points, the likelihood function will be the product of all the Gaussians/individual likelihood functions that are generated from the data points.

$$p(\mathcal{D};\mu,\Sigma) = p(\{x_1,\dots,x_N\};\mu,\Sigma) = p(x_1;\mu,\Sigma)\,p(x_2;\mu,\Sigma)\cdots p(x_N;\mu,\Sigma)$$

$$p(\mathcal{D};\mu,\Sigma) = \prod_{i=1}^{N} p(x_i;\mu,\Sigma) = \prod_{i=1}^{N} \mathcal{N}(x_i;\mu,\Sigma)$$

So we take the derivative of this (in practice, of its logarithm, which has the same maximum and is much easier to differentiate) with respect to μ, set it to 0, and we actually find the maximum likelihood parameters.
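Spelling that derivative out in the 1-D case (a sketch; the multivariate version works the same way component-wise):

$$\frac{\partial}{\partial \mu} \log L(\mu) = \frac{\partial}{\partial \mu} \sum_{i=1}^{N} \log \mathcal{N}(x_i;\mu,\sigma^2) = \sum_{i=1}^{N} \frac{x_i - \mu}{\sigma^2} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

So the maximum likelihood estimate of μ is just the sample mean.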

TL;DR

In order to get the maximum likelihood parameters for multiple data points, we multiply all the individual likelihood functions (or, equivalently, sum their logarithms), take the derivative of that, set it to 0, and solve for μ and σ.
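The whole recipe collapses to closed-form estimates: the sample mean for μ and the 1/N sample variance for σ². A minimal sketch with a made-up five-point sample (the data values are an assumption, chosen so the estimates come out to the μ = 30, σ = 2 used above):

```python
import math

def mle_gaussian(data):
    """Closed-form maximum likelihood estimates for a 1-D Gaussian."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by N (not the unbiased N - 1).
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, math.sqrt(var_hat)

data = [27.0, 29.0, 30.0, 31.0, 33.0]  # made-up sample (assumption)
mu_hat, sigma_hat = mle_gaussian(data)
print(mu_hat, sigma_hat)  # → 30.0 2.0
```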